In [1]:
import graphlab

In [2]:
# Limit number of worker processes. This preserves system memory, which prevents hosted notebooks from crashing.
graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 4)


[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1477282007.log
This non-commercial license of GraphLab Create for academic use is assigned to sudhanshu.shekhar.iitd@gmail.com and will expire on September 18, 2017.

Loading a common image dataset


In [3]:
image_train = graphlab.SFrame('image_train_data/')

In [4]:
image_test = graphlab.SFrame('image_test_data/')

Exploring the image data


In [5]:
graphlab.canvas.set_target('ipynb')

In [6]:
image_train['image'].show()


Training a classifier on the raw image pixels


In [7]:
raw_pixel_model = graphlab.logistic_classifier.create(image_train, target='label', features=['image_array'])


PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.
Logistic regression:
--------------------------------------------------------
Number of examples          : 1914
Number of classes           : 4
Number of feature columns   : 1
Number of unpacked features : 3072
Number of coefficients      : 9219
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1         | 6        | 0.000011  | 2.204523     | 0.330199          | 0.285714            |
| 2         | 8        | 1.000000  | 2.845201     | 0.376176          | 0.351648            |
| 3         | 9        | 1.000000  | 3.171270     | 0.413271          | 0.406593            |
| 4         | 10       | 1.000000  | 3.501081     | 0.427900          | 0.395604            |
| 5         | 11       | 1.000000  | 3.819451     | 0.432079          | 0.439560            |
| 6         | 12       | 1.000000  | 4.120828     | 0.436259          | 0.318681            |
| 10        | 17       | 1.000000  | 5.574198     | 0.516196          | 0.450549            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.
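The dimensions in the training log can be reproduced by hand, assuming the standard 32×32 RGB image layout of this dataset: unpacking one `image_array` yields 32 × 32 × 3 pixel values, and multiclass logistic regression fits one weight vector plus an intercept for each of the K − 1 non-reference classes.

```python
# Each image is 32x32 pixels with 3 color channels, so the unpacked
# 'image_array' feature has 32 * 32 * 3 values.
height, width, channels = 32, 32, 3
n_unpacked = height * width * channels
print(n_unpacked)  # 3072, matching "Number of unpacked features"

# One (weights + intercept) vector per non-reference class.
n_classes = 4
n_coefficients = (n_unpacked + 1) * (n_classes - 1)
print(n_coefficients)  # 9219, matching "Number of coefficients"
```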

Making predictions with this simple model


In [8]:
image_test[0:3]['image'].show()



In [9]:
image_test[0:3]['label']


Out[9]:
dtype: str
Rows: 3
['cat', 'automobile', 'cat']

In [10]:
raw_pixel_model.predict(image_test[0:3])


Out[10]:
dtype: str
Rows: 3
['bird', 'cat', 'bird']
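None of the three predictions in Out[10] matches the true labels in Out[9]. A quick check in plain Python makes the point:

```python
true_labels = ['cat', 'automobile', 'cat']  # from Out[9]
predictions = ['bird', 'cat', 'bird']       # from Out[10]

# Count positions where the prediction agrees with the true label.
n_correct = sum(t == p for t, p in zip(true_labels, predictions))
print('%d / %d correct' % (n_correct, len(true_labels)))  # 0 / 3 correct
```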

Evaluating the raw pixel model on test data


In [11]:
raw_pixel_model.evaluate(image_test)


Out[11]:
{'accuracy': 0.469, 'auc': 0.7152721666666683, 'confusion_matrix': Columns:
 	target_label	str
 	predicted_label	str
 	count	int
 
 Rows: 16
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |     bird     |       cat       |  116  |
 |     dog      |    automobile   |  118  |
 |     dog      |       dog       |  464  |
 |     cat      |       dog       |  346  |
 |     cat      |       cat       |  253  |
 |  automobile  |    automobile   |  663  |
 |     bird     |    automobile   |  158  |
 |     cat      |    automobile   |  185  |
 |     dog      |       bird      |  230  |
 |  automobile  |       bird      |   87  |
 +--------------+-----------------+-------+
 [16 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'f1_score': 0.4606416683537964, 'log_loss': 1.230523503428094, 'precision': 0.46059547850243937, 'recall': 0.46900000000000003, 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 	class	int
 
 Rows: 400004
 
 Data:
 +-----------+-----+-----+------+------+-------+
 | threshold | fpr | tpr |  p   |  n   | class |
 +-----------+-----+-----+------+------+-------+
 |    0.0    | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   1e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   2e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   3e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   4e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   5e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   6e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   7e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   8e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 |   9e-05   | 1.0 | 1.0 | 1000 | 3000 |   0   |
 +-----------+-----+-----+------+------+-------+
 [400004 rows x 6 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
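The `evaluate()` output above stores the confusion matrix as `(target_label, predicted_label, count)` rows, and the reported accuracy is simply the diagonal mass divided by the total count. A minimal pure-Python sketch of that computation, run on a hypothetical two-class matrix (the counts below are illustrative, not the full 16-row matrix from the output):

```python
# Hypothetical confusion-matrix rows in the same (target, predicted, count)
# layout that evaluate() returns.
rows = [
    ('cat', 'cat', 253),
    ('cat', 'dog', 346),
    ('dog', 'dog', 464),
    ('dog', 'cat', 120),
]

def accuracy_from_confusion(rows):
    """Fraction of examples on the diagonal (target == predicted)."""
    correct = sum(c for t, p, c in rows if t == p)
    total = sum(c for _, _, c in rows)
    return float(correct) / total

print(accuracy_from_confusion(rows))
```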

Can we improve the model with deep features?


In [12]:
len(image_train)


Out[12]:
2005

In [14]:
deep_learning_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')


Downloading http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45/dir_archive.ini to /var/tmp/graphlab-sud/20490/479056c6-76a1-4114-8253-3971070fde34.ini
Downloading http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45/objects.bin to /var/tmp/graphlab-sud/20490/161f60a8-99a6-4284-bf97-4a6e2761804c.bin

In [ ]:
image_train['deep_features'] = deep_learning_model.extract_features(image_train)


Images being resized.
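Conceptually, `extract_features` runs each (resized) image through the pretrained ImageNet network and returns the activations of a late layer as a fixed-length feature vector. The mechanics can be sketched with a toy one-layer "network" in NumPy; the random weights stand in for the pretrained model and the 4096-dimensional output is illustrative, not GraphLab's actual layer size:

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for pretrained weights: project a flattened 32x32x3 image
# down to a 4096-dimensional vector, then apply a ReLU nonlinearity.
W = rng.randn(32 * 32 * 3, 4096) * 0.01

def extract_deep_features(image):
    """Toy analogue of deep_learning_model.extract_features for one image."""
    x = image.reshape(-1)           # flatten pixels, shape (3072,)
    return np.maximum(0, x.dot(W))  # one dense layer + ReLU, shape (4096,)

image = rng.rand(32, 32, 3)         # fake image standing in for an SFrame row
features = extract_deep_features(image)
print(features.shape)               # (4096,)
```

The key idea is that these activations, learned on ImageNet, transfer to our much smaller dataset as a ready-made image representation.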

In [ ]:
image_train.head()

In [ ]:
deep_feature_model = graphlab.logistic_classifier.create(image_train, target='label', features=['deep_features'])

In [ ]:
deep_feature_model.predict(image_test[0:3])

In [ ]:
deep_feature_model.evaluate(image_test)
